class_likelihood_ratios (LR+ / LR-)#
Compute the positive and negative likelihood ratios for a binary classifier.
In scikit-learn this is sklearn.metrics.class_likelihood_ratios.
Learning goals#
Derive (LR_+) and (LR_-) from the confusion matrix
Interpret them as odds multipliers (pre-test (\to) post-test probabilities)
Implement the metric from scratch in NumPy (weights + label ordering)
Visualize how likelihood ratios change with the decision threshold
Use likelihood ratios to pick an operating point (screening vs confirmation)
Prerequisites#
Confusion matrix, sensitivity/specificity
Basic Bayes rule / odds
Logistic regression + ROC curves (helpful, but not required)
import warnings
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots
from sklearn.datasets import make_classification
from sklearn.metrics import class_likelihood_ratios, roc_curve
from sklearn.model_selection import train_test_split
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=4, suppress=True)
rng = np.random.default_rng(7)
1) Definition: likelihood ratios as conditional probability ratios#
Treat a classifier’s prediction as a diagnostic test:
test positive (\iff) predict the positive class
test negative (\iff) predict the negative class
The likelihood ratios compare how often the test is positive/negative under each true class:
[ LR_+ = \frac{P(\hat{y}=1 \mid y=1)}{P(\hat{y}=1 \mid y=0)} \qquad LR_- = \frac{P(\hat{y}=0 \mid y=1)}{P(\hat{y}=0 \mid y=0)}. ]
Why this is useful: in odds form Bayes rule becomes a multiplication.
Define odds for a probability (p):
[ \operatorname{odds}(p) = \frac{p}{1-p}. ]
Then the update is:
[ \operatorname{odds}(y=1 \mid \text{test}+) = \operatorname{odds}(y=1)\cdot LR_+, ]
[ \operatorname{odds}(y=1 \mid \text{test}-) = \operatorname{odds}(y=1)\cdot LR_-. ]
Converting odds back to probability:
[ p = \frac{\operatorname{odds}}{1 + \operatorname{odds}}. ]
Equivalently in log-odds:
[ \operatorname{logit}(p_{post}) = \operatorname{logit}(p_{pre}) + \log(LR). ]
Key point: (LR_+) and (LR_-) are functions of sensitivity and specificity (not prevalence), but turning them into post-test probabilities requires a prior (pre-test probability).
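A quick numeric check of the odds-form update, with made-up numbers (pre-test probability 10%, a test with (LR_+ = 10)):

```python
# Hypothetical worked example: pre-test probability 10%, LR+ = 10.
p_pre = 0.10
lr_plus = 10.0

odds_pre = p_pre / (1 - p_pre)        # 1/9 ~ 0.111
odds_post = odds_pre * lr_plus        # Bayes rule in odds form
p_post = odds_post / (1 + odds_post)  # convert back to a probability

print(round(p_post, 3))  # 0.526: a positive result lifts 10% to ~53%
```

Note how the same (LR_+) produces very different absolute probability changes depending on the prior, which is exactly why the prior cannot be skipped.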
2) From confusion matrix to (LR_+) and (LR_-)#
For a binary classifier with positive class (y=1) and negative class (y=0):
[ \begin{array}{c|cc} & \hat{y}=0 & \hat{y}=1 \\ \hline y=0 & TN & FP \\ y=1 & FN & TP \end{array} ]
Define:
Sensitivity / recall / true positive rate (TPR)
[ \text{TPR} = \frac{TP}{TP+FN} ]
Specificity / true negative rate (TNR)
[ \text{TNR} = \frac{TN}{TN+FP} ]
False positive rate (FPR): (\text{FPR} = 1-\text{TNR} = \frac{FP}{TN+FP})
False negative rate (FNR): (\text{FNR} = 1-\text{TPR} = \frac{FN}{TP+FN})
Then:
[ LR_+ = \frac{\text{TPR}}{\text{FPR}} = \frac{\text{sensitivity}}{1-\text{specificity}} ]
[ LR_- = \frac{\text{FNR}}{\text{TNR}} = \frac{1-\text{sensitivity}}{\text{specificity}}. ]
A commonly used single-number summary is the diagnostic odds ratio:
[ \text{DOR} = \frac{LR_+}{LR_-} = \frac{TP\cdot TN}{FP\cdot FN}, ]
but note it can be undefined/infinite when (FP=0) or (FN=0).
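The rate-based and count-based forms of these identities agree, which is easy to sanity-check with a hypothetical confusion matrix (the counts below are made up):

```python
# Hypothetical counts; verify LR+/LR- and the identity DOR = TP*TN / (FP*FN).
tp, fp, tn, fn = 90, 5, 95, 10

tpr = tp / (tp + fn)  # sensitivity = 0.90
fpr = fp / (tn + fp)  # 1 - specificity = 0.05
tnr = tn / (tn + fp)  # specificity = 0.95
fnr = fn / (tp + fn)  # 1 - sensitivity = 0.10

lr_plus = tpr / fpr   # 18.0
lr_minus = fnr / tnr  # ~0.105

dor = lr_plus / lr_minus
assert abs(dor - (tp * tn) / (fp * fn)) < 1e-9  # ~171 either way
print(lr_plus, round(lr_minus, 4))  # 18.0 0.1053
```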
def _infer_binary_labels(y_true, y_pred, labels=None):
y_true = np.asarray(y_true)
y_pred = np.asarray(y_pred)
if labels is None:
labels = np.unique(np.concatenate([np.unique(y_true), np.unique(y_pred)]))
if labels.shape[0] != 2:
raise ValueError(f"Expected 2 labels for binary classification, got {labels!r}")
labels = np.sort(labels) # sklearn default
else:
labels = np.asarray(labels)
if labels.shape[0] != 2:
raise ValueError("labels must be of length 2: [negative_class, positive_class]")
neg_label, pos_label = labels[0], labels[1]
return neg_label, pos_label
def confusion_counts_binary(y_true, y_pred, *, labels=None, sample_weight=None):
'''Return (tp, fp, tn, fn) as floats.'''
y_true = np.asarray(y_true)
y_pred = np.asarray(y_pred)
neg_label, pos_label = _infer_binary_labels(y_true, y_pred, labels=labels)
if sample_weight is None:
w = np.ones_like(y_true, dtype=float)
else:
w = np.asarray(sample_weight, dtype=float)
if w.shape != y_true.shape:
raise ValueError("sample_weight must have shape (n_samples,)")
is_pos_true = y_true == pos_label
is_pos_pred = y_pred == pos_label
tp = np.sum(w * (is_pos_true & is_pos_pred))
fp = np.sum(w * (~is_pos_true & is_pos_pred))
tn = np.sum(w * (~is_pos_true & ~is_pos_pred))
fn = np.sum(w * (is_pos_true & ~is_pos_pred))
return float(tp), float(fp), float(tn), float(fn)
def class_likelihood_ratios_numpy(
y_true,
y_pred,
*,
labels=None,
sample_weight=None,
raise_warning=True,
):
'''NumPy implementation matching sklearn.metrics.class_likelihood_ratios.'''
tp, fp, tn, fn = confusion_counts_binary(
y_true, y_pred, labels=labels, sample_weight=sample_weight
)
pos_total = tp + fn
neg_total = tn + fp
if pos_total == 0 or neg_total == 0:
if raise_warning:
warnings.warn(
"No positive or no negative samples in y_true; likelihood ratios are undefined.",
UserWarning,
)
return (np.nan, np.nan)
tpr = tp / pos_total
fnr = fn / pos_total
fpr = fp / neg_total
tnr = tn / neg_total
lr_plus = np.nan
lr_minus = np.nan
if fpr == 0:
if raise_warning:
warnings.warn("When false positive == 0, the positive likelihood ratio is undefined.")
else:
lr_plus = tpr / fpr
if tnr == 0:
if raise_warning:
warnings.warn("When true negative == 0, the negative likelihood ratio is undefined.")
else:
lr_minus = fnr / tnr
return (lr_plus, lr_minus)
# Quick sanity checks vs scikit-learn
y_true = [0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0]
print("sklearn:", class_likelihood_ratios(y_true, y_pred))
print("numpy :", class_likelihood_ratios_numpy(y_true, y_pred, raise_warning=False))
y_true = np.array(["non-cat", "cat", "non-cat", "cat", "non-cat"])
y_pred = np.array(["cat", "cat", "non-cat", "non-cat", "non-cat"])
print()
print("Default label order (sorted):")
print("sklearn:", class_likelihood_ratios(y_true, y_pred))
print()
print("Explicit labels=[negative, positive]:")
print("sklearn:", class_likelihood_ratios(y_true, y_pred, labels=["non-cat", "cat"]))
sklearn: (1.5, 0.75)
numpy : (1.5, 0.75)
Default label order (sorted):
sklearn: (1.3333333333333333, 0.6666666666666666)
Explicit labels=[negative, positive]:
sklearn: (1.5, 0.75)
3) Interpretation and common pitfalls#
Valid ranges (for a useful classifier):
(LR_+ \ge 1). Values close to 1 mean “a positive prediction barely changes the odds”.
(0 \le LR_- \le 1). Values close to 1 mean “a negative prediction barely changes the odds”.
If you ever see (LR_+ < 1) or (LR_- > 1), the classifier is often behaving like it has the labels flipped
(or your labels=[negative, positive] ordering is wrong).
Rule-of-thumb strength of evidence (very domain dependent):
| Evidence | (LR_+) | (LR_-) |
|---|---|---|
| small | 2–5 | 0.5–0.2 |
| moderate | 5–10 | 0.2–0.1 |
| large | > 10 | < 0.1 |
Pitfalls
The metric needs hard predictions (class labels). If your model outputs probabilities, you must choose a threshold first.
(LR_+) is undefined when (FP=0) ((\text{FPR}=0)). (LR_-) is undefined when (TN=0) ((\text{TNR}=0)). Small datasets can make this happen easily.
Multi-class problems need a one-vs-rest reduction; scikit-learn’s class_likelihood_ratios is binary-only.
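A minimal standalone illustration of the (FP=0) pitfall, using tiny made-up labels and plain Python so it does not depend on the helpers defined above:

```python
# Tiny made-up sample where the classifier produces no false positives at all.
y_true = [0, 0, 1, 1]
y_pred = [0, 0, 1, 1]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 2
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # 0
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # 2
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # 0

tpr = tp / (tp + fn)  # 1.0
fpr = fp / (tn + fp)  # 0.0 -> LR+ = TPR/FPR is undefined
lr_plus = tpr / fpr if fpr > 0 else float("nan")
print(lr_plus)  # nan, the same situation sklearn warns about
```

On such small samples a few lucky predictions are enough to zero out FP or TN, so undefined ratios are common rather than exotic.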
def odds(p):
p = np.asarray(p)
return p / (1.0 - p)
def prob_from_odds(o):
o = np.asarray(o)
return o / (1.0 + o)
def update_probability(p_pre, lr):
'''Bayes update in odds form.'''
return prob_from_odds(odds(p_pre) * lr)
p_pre = np.linspace(0.001, 0.999, 400)
lr_plus_values = [2, 5, 10]
lr_minus_values = [0.5, 0.2, 0.1]
fig = make_subplots(
rows=1,
cols=2,
subplot_titles=(
"Post-test probability after a POSITIVE prediction (use LR+)",
"Post-test probability after a NEGATIVE prediction (use LR-)",
),
)
for lr in lr_plus_values:
fig.add_trace(
go.Scatter(x=p_pre, y=update_probability(p_pre, lr), mode="lines", name=f"LR+={lr}"),
row=1,
col=1,
)
for lr in lr_minus_values:
fig.add_trace(
go.Scatter(x=p_pre, y=update_probability(p_pre, lr), mode="lines", name=f"LR-={lr}"),
row=1,
col=2,
)
# Reference line: no change
fig.add_trace(
go.Scatter(x=p_pre, y=p_pre, mode="lines", line=dict(dash="dash"), name="no change"),
row=1,
col=1,
)
fig.add_trace(
go.Scatter(x=p_pre, y=p_pre, mode="lines", line=dict(dash="dash"), showlegend=False),
row=1,
col=2,
)
fig.update_xaxes(title_text="pre-test probability", range=[0, 1], row=1, col=1)
fig.update_xaxes(title_text="pre-test probability", range=[0, 1], row=1, col=2)
fig.update_yaxes(title_text="post-test probability", range=[0, 1], row=1, col=1)
fig.update_yaxes(title_text="post-test probability", range=[0, 1], row=1, col=2)
fig.update_layout(width=1000, height=420)
fig.show()
4) Threshold dependence and ROC geometry#
If your model outputs a score or probability (\hat{p}), you get hard predictions via a threshold (t):
[ \hat{y}(t) = \mathbb{1}[\hat{p} \ge t]. ]
So (LR_+) and (LR_-) are functions of the threshold.
On the ROC plane (x = FPR, y = TPR) for a particular threshold:
(LR_+ = \frac{\text{TPR}}{\text{FPR}}) is the slope of the line from ((0,0)) to the ROC point.
(LR_- = \frac{1-\text{TPR}}{1-\text{FPR}}) is the slope of the line from ((1,1)) to the ROC point.
This makes the metric visually interpretable: to get a large (LR_+) you want a ROC point that is steep above the origin; to get a small (LR_-) you want a point close to the top-left.
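The two slope claims are easy to verify numerically for a hypothetical ROC point, here (FPR, TPR) = (0.1, 0.8):

```python
# Hypothetical ROC operating point.
fpr, tpr = 0.1, 0.8

def slope(p, q):
    """Slope of the line through two points (x, y)."""
    return (q[1] - p[1]) / (q[0] - p[0])

lr_plus = tpr / fpr                  # 8.0
lr_minus = (1 - tpr) / (1 - fpr)     # ~0.222

# Slope from the origin (0, 0) to the ROC point equals LR+.
assert slope((0, 0), (fpr, tpr)) == lr_plus
# Slope from the ROC point to the corner (1, 1) equals LR-.
assert slope((fpr, tpr), (1, 1)) == lr_minus
print(lr_plus, round(lr_minus, 3))  # 8.0 0.222
```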
# Synthetic 2D dataset for visualization
X, y = make_classification(
n_samples=2200,
n_features=2,
n_redundant=0,
n_informative=2,
n_clusters_per_class=1,
class_sep=1.2,
flip_y=0.05,
random_state=7,
)
X_train_val, X_test, y_train_val, y_test = train_test_split(
X, y, test_size=0.2, stratify=y, random_state=7
)
X_train, X_val, y_train, y_val = train_test_split(
X_train_val, y_train_val, test_size=0.25, stratify=y_train_val, random_state=7
)
# Standardize (helps gradient descent)
mean_ = X_train.mean(axis=0)
std_ = X_train.std(axis=0)
X_train_s = (X_train - mean_) / std_
X_val_s = (X_val - mean_) / std_
X_test_s = (X_test - mean_) / std_
def sigmoid(z):
# Stable sigmoid
z = np.asarray(z)
out = np.empty_like(z, dtype=float)
pos = z >= 0
out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
ez = np.exp(z[~pos])
out[~pos] = ez / (1.0 + ez)
return out
def fit_logreg_gd(X, y, *, lr=0.15, n_steps=2500, l2=0.01, seed=7):
rng_local = np.random.default_rng(seed)
n, d = X.shape
w = rng_local.normal(scale=0.1, size=d)
b = 0.0
eps = 1e-12
losses = []
for step in range(n_steps):
z = X @ w + b
p = sigmoid(z)
# Binary cross-entropy + L2
loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)) + 0.5 * l2 * np.sum(w * w)
# Gradients
grad_w = (X.T @ (p - y)) / n + l2 * w
grad_b = np.mean(p - y)
w -= lr * grad_w
b -= lr * grad_b
if step % 25 == 0:
losses.append(loss)
return w, b, np.array(losses)
w, b, losses = fit_logreg_gd(X_train_s, y_train)
fig = go.Figure()
fig.add_trace(go.Scatter(y=losses, mode="lines", name="train loss"))
fig.update_layout(
title="Logistic regression from scratch (gradient descent)",
xaxis_title="checkpoint (every 25 steps)",
yaxis_title="cross-entropy loss",
width=900,
height=380,
)
fig.show()
p_val = sigmoid(X_val_s @ w + b)
# Probability distributions by class (validation set)
df = {
"p_hat": p_val,
"y": y_val.astype(int),
}
fig = px.histogram(
df,
x="p_hat",
color="y",
nbins=50,
opacity=0.6,
barmode="overlay",
histnorm="probability",
title="Predicted probabilities by true class (validation set)",
labels={"p_hat": "predicted P(y=1|x)", "y": "true class"},
)
fig.update_layout(width=900, height=420)
fig.show()
def sweep_thresholds(y_true, y_proba, thresholds):
rows = []
for t in thresholds:
y_pred = (y_proba >= t).astype(int)
tp, fp, tn, fn = confusion_counts_binary(y_true, y_pred, labels=[0, 1])
pos_total = tp + fn
neg_total = tn + fp
tpr = tp / pos_total if pos_total > 0 else np.nan
fnr = fn / pos_total if pos_total > 0 else np.nan
fpr = fp / neg_total if neg_total > 0 else np.nan
tnr = tn / neg_total if neg_total > 0 else np.nan
lr_plus = tpr / fpr if (np.isfinite(fpr) and fpr > 0) else np.nan
lr_minus = fnr / tnr if (np.isfinite(tnr) and tnr > 0) else np.nan
dor = (
lr_plus / lr_minus
if (np.isfinite(lr_plus) and np.isfinite(lr_minus) and lr_plus > 0 and lr_minus > 0)
else np.nan
)
rows.append((t, tp, fp, tn, fn, tpr, tnr, lr_plus, lr_minus, dor))
arr = np.array(rows, dtype=float)
return {
"threshold": arr[:, 0],
"tp": arr[:, 1],
"fp": arr[:, 2],
"tn": arr[:, 3],
"fn": arr[:, 4],
"tpr": arr[:, 5],
"tnr": arr[:, 6],
"lr_plus": arr[:, 7],
"lr_minus": arr[:, 8],
"dor": arr[:, 9],
}
thresholds = np.linspace(0.01, 0.99, 99)
sweep = sweep_thresholds(y_val, p_val, thresholds)
def pick_operating_points(sweep, *, min_sensitivity=0.95, min_specificity=0.95):
thresholds = sweep["threshold"]
sens = sweep["tpr"] # sensitivity
spec = sweep["tnr"] # specificity
lr_plus = sweep["lr_plus"]
lr_minus = sweep["lr_minus"]
# A generic way to combine LR+ and LR- into one objective: diagnostic odds ratio (DOR)
# Fallback: Youden's J = sensitivity + specificity - 1 (always defined as long as rates are defined)
dor = sweep["dor"]
youden_j = sens + spec - 1
if np.any(np.isfinite(dor)):
t_best = thresholds[np.nanargmax(dor)]
best_label = "max DOR"
else:
t_best = thresholds[np.nanargmax(youden_j)]
best_label = "max Youden J (fallback)"
# Screening: prioritize ruling OUT => minimize LR- while keeping sensitivity high
mask_screen = (sens >= min_sensitivity) & np.isfinite(lr_minus)
if mask_screen.any():
t_screen = thresholds[mask_screen][np.nanargmin(lr_minus[mask_screen])]
else:
t_screen = thresholds[np.nanargmin(lr_minus)]
# Confirmation: prioritize ruling IN => maximize LR+ while keeping specificity high
mask_confirm = (spec >= min_specificity) & np.isfinite(lr_plus)
if mask_confirm.any():
t_confirm = thresholds[mask_confirm][np.nanargmax(lr_plus[mask_confirm])]
else:
t_confirm = thresholds[np.nanargmax(lr_plus)]
return t_best, t_screen, t_confirm, best_label
t_best, t_screen, t_confirm, best_label = pick_operating_points(sweep)
(t_best, t_screen, t_confirm, best_label)
(0.46, 0.05, 0.93, 'max DOR')
def _vline(fig, x, *, label, color):
fig.add_vline(x=x, line_width=2, line_dash="dash", line_color=color)
fig.add_annotation(
x=x,
y=1.02,
xref="x",
yref="paper",
text=label,
showarrow=False,
font=dict(color=color),
)
fig = make_subplots(rows=1, cols=2, subplot_titles=("LR+ vs threshold", "LR- vs threshold"))
fig.add_trace(
go.Scatter(x=sweep["threshold"], y=sweep["lr_plus"], mode="lines", name="LR+"),
row=1,
col=1,
)
fig.add_trace(
go.Scatter(x=sweep["threshold"], y=sweep["lr_minus"], mode="lines", name="LR-"),
row=1,
col=2,
)
for x, label, color in [
(t_best, best_label, "#1f77b4"),
(t_screen, "screening", "#2ca02c"),
(t_confirm, "confirm", "#d62728"),
]:
_vline(fig, x, label=label, color=color)
fig.update_yaxes(type="log", row=1, col=1)
fig.update_yaxes(type="log", row=1, col=2)
fig.update_xaxes(title_text="threshold t", row=1, col=1)
fig.update_xaxes(title_text="threshold t", row=1, col=2)
fig.update_yaxes(title_text="LR+ (log scale)", row=1, col=1)
fig.update_yaxes(title_text="LR- (log scale)", row=1, col=2)
fig.update_layout(width=1000, height=420)
fig.show()
# ROC curve (validation set) + geometric interpretation of LR
fpr, tpr, thr = roc_curve(y_val, p_val)
fig = go.Figure()
fig.add_trace(go.Scatter(x=fpr, y=tpr, mode="lines", name="ROC"))
fig.add_trace(
go.Scatter(x=[0, 1], y=[0, 1], mode="lines", line=dict(dash="dash"), name="random")
)
# Get the ROC point closest to our chosen threshold t_best
# (roc_curve returns thresholds in decreasing order)
idx = np.argmin(np.abs(thr - t_best))
x_pt, y_pt = fpr[idx], tpr[idx]
# LR slopes at that operating point
lr_plus = y_pt / x_pt if x_pt > 0 else np.inf
lr_minus = (1 - y_pt) / (1 - x_pt) if (1 - x_pt) > 0 else np.inf
fig.add_trace(
go.Scatter(
x=[x_pt],
y=[y_pt],
mode="markers",
marker=dict(size=10, color="#1f77b4"),
name=f"t≈{t_best:.2f}",
)
)
# Lines showing the slopes
fig.add_trace(
go.Scatter(x=[0, x_pt], y=[0, y_pt], mode="lines", line=dict(color="#1f77b4"), showlegend=False)
)
fig.add_trace(
go.Scatter(x=[1, x_pt], y=[1, y_pt], mode="lines", line=dict(color="#d62728"), showlegend=False)
)
fig.update_layout(
title=f"ROC geometry at t≈{t_best:.2f}: LR+≈{lr_plus:.2f}, LR-≈{lr_minus:.2f}",
xaxis_title="FPR",
yaxis_title="TPR",
width=900,
height=500,
xaxis=dict(range=[0, 1]),
yaxis=dict(range=[0, 1]),
)
fig.show()
def metrics_at_threshold(y_true, y_proba, t):
y_pred = (y_proba >= t).astype(int)
lr_p, lr_m = class_likelihood_ratios_numpy(y_true, y_pred, labels=[0, 1], raise_warning=False)
tp, fp, tn, fn = confusion_counts_binary(y_true, y_pred, labels=[0, 1])
tpr = tp / (tp + fn)
tnr = tn / (tn + fp)
return {
"t": t,
"tp": tp,
"fp": fp,
"tn": tn,
"fn": fn,
"tpr": tpr,
"tnr": tnr,
"lr_plus": lr_p,
"lr_minus": lr_m,
}
m_best = metrics_at_threshold(y_val, p_val, t_best)
m_screen = metrics_at_threshold(y_val, p_val, t_screen)
m_confirm = metrics_at_threshold(y_val, p_val, t_confirm)
m_best, m_screen, m_confirm
({'t': 0.46,
'tp': 203.0,
'fp': 9.0,
'tn': 212.0,
'fn': 16.0,
'tpr': 0.9269406392694064,
'tnr': 0.9592760180995475,
'lr_plus': 22.76154236428209,
'lr_minus': 0.07616093736538296},
{'t': 0.05,
'tp': 218.0,
'fp': 145.0,
'tn': 76.0,
'fn': 1.0,
'tpr': 0.9954337899543378,
'tnr': 0.3438914027149321,
'lr_plus': 1.5171783971028183,
'lr_minus': 0.01327805815909637},
{'t': 0.93,
'tp': 115.0,
'fp': 2.0,
'tn': 219.0,
'fn': 104.0,
'tpr': 0.5251141552511416,
'tnr': 0.9909502262443439,
'lr_plus': 58.02511415525114,
'lr_minus': 0.4792227017785283})
fig = make_subplots(
rows=1,
cols=3,
subplot_titles=(
f"Screening (t={t_screen:.2f})",
f"Max DOR (t={t_best:.2f})",
f"Confirm (t={t_confirm:.2f})",
),
)
for j, m in enumerate([m_screen, m_best, m_confirm], start=1):
cm = np.array([[m["tn"], m["fp"]], [m["fn"], m["tp"]]], dtype=float)
fig.add_trace(
go.Heatmap(
z=cm,
x=["pred 0", "pred 1"],
y=["true 0", "true 1"],
colorscale="Blues",
showscale=False,
text=cm.astype(int),
texttemplate="%{text}",
textfont=dict(size=16),
),
row=1,
col=j,
)
fig.update_layout(
width=1050,
height=420,
title="Confusion matrices at three operating points (validation set)",
)
fig.show()
# How the chosen operating point changes post-test probability
p_pre = 0.10 # example prior/prevalence
for name, m in [("screening", m_screen), ("max_dor", m_best), ("confirm", m_confirm)]:
p_pos = update_probability(p_pre, m["lr_plus"]) # after a positive prediction
p_neg = update_probability(p_pre, m["lr_minus"]) # after a negative prediction
print(
f"{name:9s} t={m['t']:.2f} LR+={m['lr_plus']:.2f} LR-={m['lr_minus']:.2f} "
f"p(y=1|+)= {p_pos:.3f} p(y=1|-)= {p_neg:.3f}"
)
screening t=0.05 LR+=1.52 LR-=0.01 p(y=1|+)= 0.144 p(y=1|-)= 0.001
max_dor t=0.46 LR+=22.76 LR-=0.08 p(y=1|+)= 0.717 p(y=1|-)= 0.008
confirm t=0.93 LR+=58.03 LR-=0.48 p(y=1|+)= 0.866 p(y=1|-)= 0.051
5) Using likelihood ratios to optimize a simple algorithm#
(LR_+) and (LR_-) are defined through counts (TP/FP/TN/FN), so they are not differentiable w.r.t. model parameters.
A common workflow is therefore:
Train a probabilistic model (e.g. logistic regression) using a differentiable loss (cross-entropy)
Use likelihood ratios on a validation set to pick an operating point (decision threshold)
Example strategies:
Screening test (rule out): pick a threshold with high sensitivity and minimal (LR_-)
Confirmatory test (rule in): pick a threshold with high specificity and maximal (LR_+)
Single-number optimization: maximize DOR = (LR_+/LR_-) (useful, but can be unstable if FP or FN are small)
# Final check on a held-out test set using the max-DOR threshold from validation
p_test = sigmoid(X_test_s @ w + b)
y_pred_test = (p_test >= t_best).astype(int)
print("Test set LR (sklearn):", class_likelihood_ratios(y_test, y_pred_test, labels=[0, 1]))
print("Test set LR (numpy) :", class_likelihood_ratios_numpy(y_test, y_pred_test, labels=[0, 1], raise_warning=False))
Test set LR (sklearn): (11.044393708777271, 0.10936410464043908)
Test set LR (numpy) : (11.044393708777271, 0.10936410464043908)
Pros / cons / when to use#
Pros
Interpretable: directly tells you how to update odds (prevalence + test result (\to) posterior)
Uses sensitivity/specificity, so it is more stable across different prevalences than precision/NPV
Naturally supports “rule-in” (large (LR_+)) vs “rule-out” (small (LR_-)) thinking
Cons
Threshold-dependent and based on hard predictions (not a ranking metric like AUC)
Can be undefined/infinite when (FP=0) or (TN=0), especially on small datasets
Binary-only; multi-class needs one-vs-rest and careful reporting
Good fits
Medical diagnostic tests, screening vs confirmation
Any binary decision where base rate/prevalence is known or can be estimated and you need a domain-friendly “odds update” explanation
Exercises#
On a dataset you care about, sweep thresholds and compare:
max (LR_+) at specificity (\ge 0.95)
min (LR_-) at sensitivity (\ge 0.95)
max DOR
Do these thresholds match what you would pick using accuracy or F1?
Implement one-vs-rest likelihood ratios for multi-class classification and report the per-class (LR_+) and (LR_-).
References#
scikit-learn: sklearn.metrics.class_likelihood_ratios
Wikipedia: https://en.wikipedia.org/wiki/Likelihood_ratios_in_diagnostic_testing